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ABSTRACT 

In this paper we consider the numerical solution of stiff initial value problems, which lead 
to the problem of solving large systems of mildly nonlinear equations. For many problems 
derived from engineering and science, a solution is possible only with methods derived from 
iterative linear equation solvers. A common approach to solving the nonlinear equations is 
to employ an approximate solution obtained from an explicit method. In this paper we shall 
examine the error to determine how it is distributed among the stiff and non-stiff components, 
which bears on the choice of an iterative method. Our conclusion is that error is (roughly) 
uniformly distributed, a fact that suggests the Chebyshev method (and the accompanying 
Manteuffel adaptive parameter algorithm). We describe this method, also commenting on 
Richardson’s method and its advantages for large problems. We then apply Richardson’s 
method and the Chebyshev method with the Manteuffel algorithm to the solution of the 
nonlinear equations by Newton’s method. 
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1 Introduction 


The numerical solution of initial value problems requires at each time-step the solution of an 
implicit nonlineax equation, usually by a Newton-like iteration. As a result it is necessary to 
solve a system of linear equations, which is often large and sparse. It has been customary 
to use direct methods, but in recent years studies on the suitability of iterative methods 
[18, 9, 3, 1] have been emerging. 

This paper examines some issues in the application of iterative methods to initial value 
problems (IVPs). First, we identify more clearly the nature of the task to be performed by 
the iterative solver. A number of comments have been made about the question of whether 
the predictor error lies primarily in the dominant subspace. These comments have not been 
convincingly supported by analysis. The difficulty with the analysis of stiff equations is that 
of knowing which quantities are large and which are small — these things change during the 
course of the integration. We believe we have found a satisfactory way of dealing with this 
question and that is to regard as small those quantities which the error control mechanisms 
of the algorithm make small. Second, we explain why the Chebyshev method might be well 
suited as an inner iteration to solve the linear equations arising at each step of the Newton 
step. In addition we consider an application of the first order Chebyshev method to the 
outer, Newton iteration. (The first order Chebyshev method is Richardson’s method with 
Chebyshev parameters.) 

This paper is a preliminary theoretical study; the accuracy and usefulness of our obser- 
vations await experimental confirmation. 

1.1 Outline of the Paper 

In §2, the standard time-stepping approach for solving linear and nonlinear stiff IVPs is 
described in a simplified way. When the IVP is nonlinear, a variant of Newton’s method is 
employed at each time step, usually the modified Newton’s method in which the Jacobian 
is not always up-to-date. (It will be convenient to say Newton’s method, however, rather 
than either modified Newton’s method or, as is also used, the modified inexact Newton’s 
method.) Each Newton step requires the solution of a linear system for which the matrix is a 
Jacobian, either the current Jacobian or one saved from a previous Newton step. (Matrix-free 
methods avoid explicit computation of the Jacobian but from time to time the Jacobian will 
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be referred to as though it had been computed and is available.) Assume that an iterative 
method is used to solve the linear system. This is an inner iteration whereas Newton’s 
method is an outer iteration in a combined inner-outer iteration. 

Iterative methods differ in the ways in which the error is reduced. In §3, the error in the 
predictor step is analyzed. The predictor step may be thought of as the initial guess for the 
inner iteration, leading to the discussion in §4 of appropriate (inner) iterative methods for 
reducing the error in the predictor. 

In §5, an application of techniques employed in the Chebyshev iterative method is made 
to the outer iteration to give more control over convergence. 

1.2 Terminology and Notation 

It is customary to discuss the solution of IVPs only in the real case. However, to take into 
account the importance of complex IVPs, we use the more general and more appropriate 
terms hermitian and nonhermitian rather than symmetric and nonsymmetric. 

The Jacobian of functions / and F will be denoted by /' and F' respectively. 

The computed approximation to the solution of an I VP at step n will be denoted by y n . 
Usually, y n is obtained by approximately solving a nonlinear equation, the exact solution of 
which, when clarity is required, will be denoted by y*. 

2 Nonlinear Iteration in IVPs 

Assume a transient problem y'(t ) = /(y(t )) with an appropriate set of initial values. (For 
convenience, we assume the system is expressed in autonomous form.) If the equation is stiff, 
the standard numerical solution methods, based on Newton’s method, yield linear systems to 
be solved. We illustrate with backward Euler. There is little reason to believe that the ideas 
do not apply to other implicit difference schemes. For backward Euler, the approximation, 
y n , to y(t n ) at t n is the computed solution of 

Vn = 2/n-i + Kf{y* n ) (1) 

This is an implicit equation for the unknown y*. Rearranging (1) slightly, it is necessary to 
solve an equation of the form 

Hv'J ■= - /( y-j - = o (2) 

h n 
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The solution of (2) by Newton’s method requires the solution of the linear systems 

F , (vir ) )(.vt+ l) - vi m) ) = -nvtr'y (3) 


Customarily using a direct method one iterates with 

yi m+1) = yi m) - cG-'FiyW) (4) 

A 

where G is a Jacobian of F kept either from an old time step or from an earlier iteration 
and c is an acceleration parameter. Since G is a matrix, it follows that computing the vector 
x = G~ 1 F(y^) is equivalent to solving a set of linear equations, 

Gx = F(ylT>). (5) 

The solution of the nonlinear equation (2) is called the corrector step, and the initial 
approximation of yft := y% obtained from an explicit multistep formula is called the predictor 
step. 

One could instead approximate the solution of (3) by means of a matrix-free iterative 
method. Such methods require only that we be able to compute the action of the Jacobian 
matrix F'(yW)v for an arbitrary vector v. This can be accurately approximated by 

I [f(y(") + fo) - F(j,<">)] 

for some small S of the order of the square root of the machine epsilon. Note that F(yW) 
is already available, because it is the right-hand side of the linear system (3). Hence each 
iteration of a typical linear iterative solver requires but one function evaluation. The stopping 
criterion for the inner linear iteration would be based on that of the outer Newton iteration. 

This question of when to stop the inner iteration is not easy: one would like to avoid 
unnecessary inner iterations without degrading the convergence of the outer iteration. Hence 
it has been proposed [4] that a single nonlinear iteration would be more efficient than a 
two-level iteration. However the one-level iteration proposed in [4] requires two function 
evaluations per iteration; whereas the two-level iteration requires one function evaluation 
per inner iteration and one per outer iteration. 

Currently it seems more practical to use iterative methods to solve (5) if there is some 
preconditioning, although this usually requires that an explicit matrix be available. 
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3 Accuracy Needed 


Much discussion has focused on this objective of the nonlinear solution to impose stability 
[9, 3]. In this section, we present some analysis to show what components of the error should 
be reduced by an iterative method. 

Consider now the backward Euler formula 

Vn = Vn-l + hnfiVn) + h n F n (6) 

where the residual (per unit step) F n := F(y n ) would be zero if the nonlinear system were 
solved exactly. 

One question concerns the nature of the error in the predictor. In practice, the predictor 
is usually 

Vn = J/n— 1 "t" ^nj/n— l,n— 2 (7) 

where the double subscript denotes a first divided difference. (Often this is written y* = 
Vn-\ + ^uVn-i where y' n -i is chosen, not to satisfy y' n _\ = f(yn- 1 ), hut rather to satisfy 

the formula used on the (n — l)-st step. If this formula is backward Euler, then y n -i = 

Vn -2 + ^n-i2/n_i determines y' n _ x .) The residual F% := F(y*) for the predicted value satisfies 

til = *-i + h n f(y$) + h n FZ (8) 

and the nature of this residual depends on how the stepsize is controlled. It is customary to 
control the stepsize h n - 1 so that ||/e n _i|| < e and to choose hn so that r^||Ie„_i|| < (safety 
factor) e, where e is the local error tolerance, Ze n _i is a local error estimate, and r n is the 
stepsize ratio h n /h n ^i. For the local error estimate, it is common to use 

= hnyn,n—l,n—2 ( 9 ) 

where the triple subscript denotes a second divided difference. What is wanted is an expres- 
sion for the predictor residual which, as much as possible, is in terms of quantities known to 
be small. It will be shown at the end of this section that 


F% = (1 + r n )F n -i — r n F n ~2 


1 


— t— r^(l + r n \)le n _x + nonlinear term 

h n 

where the nonlinear term is given in a later paragraph. 


( 10 ) 
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Let y * be the exact solution of F(y*) = 0. Codes attempt to control the error, S n = 
y„ — y n , in the solution typically so that it is no greater than some fraction <f> of the local 
error tolerance: 

||£n|| < <M hopefully). 

In practice, an approximation for the error is necessary, and usually this is a scalar multiple 
of — y^ . The widely distributed package DASSL[20] uses <f> = 1/3. We now want 

to look at (10) to determine what the iterative linear equation solver must do. To simplify 
this description, neglect the nonlinear term as well as variations in the stepsize and in the 
Jacobian matrix f of /. Under these assumptions the error $„ in solving for y n satisfies 

and the predictor error satisfies 

= 2<„.i - S n -2 +2(1 - /./O' 1 1. (11) 

The conclusion is that the nonstiff error must be reduced by as much as 1/(3 + 2 <f> _1 ) (set 
hf = 0) and the stiff error by as much as 1/3 (set hf = — oo). Trying to control the error 
in an iteration based on the size of the correction is tricky unless the convergence is rapid; 
thus, DASSL requires a convergence factor not greater than 0.9. Therefore, for an iterative 
method one might want to consider controlling the residual instead [25, 3]. The price for 
this is that instead of reducing the error by 1/(3 + in just the nonstiff components, 
the more severe reduction must be done for all components. In addition, [5] argues that the 
possible skewness of the eigensystem of f can make it risky to control the residual. 

One might consider loosening up on the fraction </>; however, the stability of the method 
[16] and the reliability of the local error estimation depends on having an accurate solution 
of the nonlinear system. Nonetheless, this matter should be re-examined because of its 
importance to the efficient use of iterative (non)linear equation solvers. 

It remains to discuss the nonlinear term in (10), which can be shown to be 

nonlinear term = ^/£(1 + r“ 1 )(/ w )„-iyn-i, n - 2 l/n-i , n -2 

where (f ") n - i is some average value of f". It seems quite possible that this could be a 
significant part of the stiff error for some problems. It can be shown that the contribution of 
the nonlinear term to the residual can be computed at a cost of one function evaluation (and 
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estimated for practically nothing). It is a question whether the nonlinear error has a special 
structure, and whether stepsize control should ensure that the nonlinear error is small. 

Recently, a different stepsize control strategy has been proposed in [14], which for our 
example would mean using & n y„ in _i for the local error. This is consistent with current 
practice in nonstiff solvers where the stepsize is controlled for a formula that is one order 
lower than that actually used. The foregoing analysis of the predictor error simplifies in this 
case. 

We conclude this section by deriving (10). Using Eqs. (6 n _ 2 ) 2 , (6 n _i), and (8„) to 
determine F n _ 2, F n - 1, and F% with y p eliminated by Eq. (7), we get 

F n p - F n _ x = z(t n ) - z(t n _0 (12) 

and 

F n — X F n _2 = ^(^71-2) (2/n— l,n— 2 Vn- 2,n— 3) (13) 

where 

z{t) - f(Vn- 1 + (t~ <n-l)yn-X,n-2)- 

Hence, using (12) and (13) to create second divided differences, we get 

Fn,n- l,n-2 = Z \Pnt tn-1 j ^n- 2 ] (^n 4" ^n-l) (1 4 7 'n-l)2/n-l,n-2,n-3- 

We can express the divided difference 

z \bni tn-ly tn-2] = f K(t)z (t) dt 

Jtn-l 

where the Peano kernel K(t) is the hat function for t — f n _i divided by h n 4 h n _i. If we 
define 

(/w) n_1 = 2 / K{t)fyy{y n -\ 4 (t — ^n-l)l/n-l, n - 2 ) dt 
and use (9), we get 

Fn,n-l,n-2 = '^fyy)n-\yn-\,n-2yn-\,n-2 ~ (h n 4 ^n-l) (1 4 r n-l)^n-l^ e n-l 

from which (10) follows. 

2 The notation means that in (6, n is replaced by n - 2, and similarly in the next two references. 
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4 Iterative Linear Solution Methods 


In this section, iterative methods will be sketched, focusing on the Chebyshev method and 
Richardson’s method. Technicalities will not be emphasized. 

4.1 Krylov Subspace Methods 

Let Ax = b be a linear set such as in (3). Let ®(°) be an initial guess and := b — Ax 
be the initial residual. The iterative methods that will be considered here are such that the 
solution approximation is of the form x W = x W +y, where y G Vi := span{r( 0 ), . . . , A*“M 0 )}, 
and Vi is the Krylov subspace. Such methods are called Krylov subspace methods. (Another 
widely used term for Krylov subspace methods is polynomial methods .) They include and 
unify a large class of methods, such as the conjugate gradient method, and for nonhermitian 
matrices, the adaptive Chebyshev iteration of Manteuffel [17], and Richardson’s method 
together with GMRES [21], ORTHOMIN and other conjugate gradient-like methods. 

4.2 The Residual Polynomial 

Let ®W be the ith approximation resulting from an iterative method. Also let = x — ®W 
and rW = b — Ax^ be the error and residual respectively. For a Krylov subspace method, 
the error eM is related to the initial error by a residual polynomial Ri defined by eW = 
Ri(A)e(°\ The name results from the fact that the same polynomial propagates the residual, 
rW = Ri(A)A°\ (The GMRES and ORTHOMIN methods are restart methods for which 
e(°) may be the initial error due not to the initial approximation of Ax = b but to some 
later approximation. Also the adaptive Chebyshev method of Manteuffel [17] is a restart 
method.) 

4.3 Types of Acceleration 

Methods differ in the way that the residual polynomial is determined. Thus, in conjugate- 
gradient-like methods the residual polynomial minimizes a certain weighted norm of the error. 
In the Chebyshev method the polynomial is chosen as that polynomial minimizing a uniform 
norm over an ellipse among all residual polynomials of a fixed degree and turns out to be 
a scaled and translated Chebyshev polynomial. There are practical differences among these 
methods of course. Conjugate-gradient-like methods determine their residual polynomial 
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in an automatic way whereas the Chebyshev method requires a set of parameters, which, 
nevertheless, can be computed in an effectively automatic way by means of the Manteuffel 
algorithm [17]. 

The Manteuffel algorithm chooses parameters based on an estimate of the convex hull 3 
of the eigenvalues, beginning, for example, with a point known to be inside the convex hull. 
After every 20 or so iterations the convex hull is expanded (and the parameters recomputed) 
to include estimates of several stray eigenvalues. The combination of the Chebyshev method 
with the Manteuffel algorithm will be referred to as the adaptive Chebyshev method (of 
Manteuffel). 

One advantage of the adaptive Chebyshev method is as follows. The solution of stiff 
IVPs yields a succession of linear systems to solve, each one often not much different from 
the previously solved problem. This is certainly true for the linear systems arising from 
successive Newton iterations. From one time-step to the next there may be a significant 
change in A = j-I — /' due to changes in h, but it is straightforward, in the absence of 
preconditioning, to calculate what this does to the eigenvalues of the matrix A. The idea 
then is that eigenvalue estimates from one linear system can be used as initial estimates 
for the next system. However, because the algorithm works only by expanding the convex 
hull approximation, it would be important to begin the solution of a new linear system by 
suitably shrinking the final convex hull from the previous system so that it lies inside the 
true convex hull. 

4.4 Richardson’s Method 

The Chebyshev method minimizes the residual polynomial over an ellipse, which will be 
called the Manteuffel ellipse , containing the convex hull (approximately) of the spectrum of 
the system matrix A. If the convex hull is not well-approximated by the enclosing Manteuffel 
ellipse, then one might want to consider an iterative method for which the residual polynomial 
is minimized over the convex hull. If the convex hull is not elliptical a second order iteration 
is not, in general, possible. (A second order iteration is one for which the iterate is optimum 
at each step; the Chebyshev iteration is an example. Second order iterations require that 
the residual polynomial be generated by a three term recursion. In general a three term 
recursion does not exist.) However, a first order method is always possible. A first order 
3 The convex hull of a set of points is the intersection of the convex sets containing the set of points. 
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method is often called Richardson's method. (For a statement of Richardson’s method in 
algorithm form, which is not necessary for this discussion, see [13].) 

It should be noted that Richardson’s method does not require working with a convex 
hull; a general nonconvex set containing the eigenvalues may be used instead. A reason for 
using the convex hull is that it may be computed from known procedures, whereas there is 
no known, general procedure for computing a non-convex region containing the spectrum. 

If Richardson’s method is used, the parameter difficulty is much the same as for the 
adaptive Chebyshev algorithm of Manteuffel. In fact if the residual polynomial is a scaled 
and translated Chebyshev polynomial, then Richardson’s method is just a first order imple- 
mentation of the second order Chebyshev method used in the Manteuffel algorithm. 

In [23], an adaptive algorithm is presented for Richardson’s method that determines the 
convex hull of the spectrum in a way analogous to the Manteuffel algorithm. It should be 
noted that the adaptive method in [23] is a combination of two iterative styles, one from the 
GMRES method, used both to advance the solution and compute an approximate (convex 
hull of the) spectrum, and the other from Richardson’s method. This combination was 
suggested in [6]. 

One final note on Richardson’s method: The simplicity of the method is an advantage 
on advanced processors when organizing the computations to minimize data traffic [22]. 

4.4.1 Minimizing the L 2 Norm of Residual Polynomials 

If the convex hull is determined, then one can determine the residual polynomial in an 
optimum way to satisfy a weighted L 2 norm induced from the inner product, 

(«,*)» = (I*) 

where T is a contour in the complex plane, the boundary of the convex hull of the spectrum 
in the application to the Richardson’s method parameter problem. 

The L 2 optimal residual polynomial of degree k is defined to be the solution of the least 
squares problem (i2*, Rk)w = minimum. There are methods [24] to compute the roots of this 
polynomial related to the stable algorithm of Golub and Welsch for the computation of nodes 
and weights for Gaussian quadrature [11]. The reciprocals of the roots of the polynomial 
then become the parameters required for Richardson’s method. 
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4.4.2 Minimizing the Uniform Norm of Residual Polynomials 

In the literature recently there have been ideas developed for the practical computation of 
Richardson’s method parameters for which the residual polynomial is approximately min- 
imized in the uniform norm [7, 8, 19, 26]. These methods use the theory of conformal 
mapping and the properties of Faber polynomials. The resulting residual polynomial may 
be called here the Loo optimal residual polynomial. An important contribution from [8, 26] 
is an algorithm for ordering the parameters for Richardson’s method to ensure stability. 

In the adaptive algorithm in [23], an L 2 optimal residual polynomial is computed and 
used but this is not a restriction and the Loo optimal residual polynomial could be used 
instead. 

4.5 The Appropriate Iterative Method for IVPs 

The nature of the error reduction needed to solve the IVP was briefly explained in §3. 
Equation (11) shows that 

where A = I — hf, He'll < 3 <j>e and ||e"|| < 2e. From this expression, one sees that the 
nonstiff error must be reduced by as mqch as 1/(3 + 2 and the stiff error by as much as 
1/3. A simplified and satisfactory conclusion to draw from this is that the stiff and nonstiff 
errors should be damped uniformly. Thus if the spectrum is well approximated by an ellipse, 
the adaptive Chebyshev method of Manteuffel is reasonable. If it is not, one may want to 
consider Richardson’s method. If the residual polynomial is determined by minimizing the 
L 2 norm of the residual polynomial derived from the inner product (14), then a reasonable 
choice for the weight is 

w(X) = 1. 

It should be noted that this is not the Chebyshev weight function, which is the weight 
function that should be selected if the convex hull is an interval (of positive numbers), and 
so causes the L 2 norm to behave like the uniform norm only approximately. 

In using either the conjugate gradient method in the hermitian positive definite case 
or conjugate gradient-like methods in the nonhermitian case a weight is imposed on eigen- 
components of the error equal to the magnitude of the corresponding eigenvalue. As argued 
here, such a weight function is not appropriate. 


10 


5 Accelerating the Newton Iteration 

From §2, the Newton iteration is, for m > 0, 

y£ m+1) = y£ m) - cG-'FiyW) (15) 

where c is an acceleration parameter and 

F(v) := f y - f(y) - rV"-'- («) 

n n rln 

(Recall that the solution of F(y) = 0 is denoted by y = y*.) A sequence of acceleration 
parameters will be derived by interpreting iteration (15) as a preconditioned Richardson’s 
iteration. With an assumption of linearity, acceleration parameters may then be derived by a 
straightforward application of standard adaptive iteration parameter techniques [17, 13, 23]. 
Other approaches are possible, for example, [15]. An extended treatment of the solution of 
nonlinear equation by Krylov subspace methods is given in [2]. 

5.1 Krylov Subspace from a Nonlinear Operator 

A - 

A subspace will be defined that is generated by the preconditioned nonlinear operator, G~ X F, 
appearing in Newton’s method. A linear approximation to the nonlinear operator allows 
the subspace to be interpreted as a Krylov subspace. In turn, the Krylov subspace yields 
acceleration parameters based on Chebyshev polynomials. 

5.1.1 Linear Approximation 

Let M be the Jacobian of G~ X F evaluated at M = Cf~ 1 F , (y^). Matrix M is only an 
aid to exposition, and, of course, is never computed. The true assumption is not that the 
Jacobian is evaluated at y ^ but that the Jacobian is slowly varying during some portion of 
the Newton iteration. If so, then the point at which M is said to be the Jacobian of G~ l F 
is arbitrary, and, in particular, the choice y^ is arbitrary. 

Since G is the Jacobian of F evaluated at some y„ 7* \ it would follow that M ^ I unless 
2/^7 ^ = y£\ Therefore, assume that ^ ^ yj^K 

5.1.2 Newton’s Method as Richardson’s Method 

Given an approximation y$™) to the solution of G~ 1 F(y„ ) = 0, let 

r (”») = -G~ 1 F(y^ n )) 
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be the residual due to y Equation (15) becomes 

d m) = 2/i ro_1) + 

Let 

Subtract (17) from y* = y* to obtain 

e ("») _ gC"*- 1 ) _ cr^ m-1 ^. 

Next, since 

r ( m ) = _ G- 1 2P(yi m) ) » Me (m) 

it follows from (18) that 

e ( m > « (I - cM)e( m ~ 1 K 

Therefore 

e<"*> « (7 - cM) m eP>. 

Multiply (20) by M to obtain 

r (m) » (7 - cM) m r (0) . 


(17) 

(18) 

(19) 

( 20 ) 
( 21 ) 


5.1.3 Krylov Subspace 

As the Newton iteration proceeds, a Krylov subspace, V*., is defined by 7 — cM 

V k := span{rW, ...,(/- cM)*-V°>}. 

Although Vit is not computed, the subspace 

V k := span{rW}^ 


is available and 




V*. 


Computations requiring V k can be performed using V*. 
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5.2 Chebyshev Acceleration Parameters 

From the k vectors in V*, k — 1 eigenvalues of M may be estimated. (It should be noted 
that in practice, fc is a small number; in the code DASSL, the total number of Newton 
steps is limited to 4.) From these eigenvalues, the convex hull of the spectrum of / — cM 
can be computed, which yields [17] two parameters d and c e m pse from which a sequence 
of acceleration parameters c may be computed. Parameter d is the center of a family of 
confocal ellipses for which the foci are d± c e ui pse . The Manteuffel algorithm determines these 
parameters in order to optimize convergence of the Chebyshev method. 

The d and c e ui P se parameters yield a sequence of Chebyshev parameters to use as follows. 
Let k be the total number of Newton steps to be executed. Of course k is not known; 
however, for the moment, assume that it is. Let c rn ^\ ) m = k + 1 , . . . , l < « be the 
reciprocals of the roots of the polynomial 

C,_ fc (A) = T,_* (— ) 

\ ^ellipse / 

where Tj is the Chebyshev polynomial of degree j. 

In place of (17), execute 

vi m) = »$ (22) 

for m — k + 1, . . • 

There is an obvious difficulty to resolve: the number l must be determined in advance, 
and must satisfy l < k, where k is unknown. One approach is this. Choose some fixed l and 
compute the Chebyshev parameters. It is not expected that l = k so that either there are too 
few parameters Cm to get to Newton step « or there are too many. If there are not enough, 
then recycle the old, computed parameters c™, in which case eventually either there will be 
unused parameters c m or all parameters will have been used. Thus, plans must be made 
for unused parameters. To prepare for too many parameters, order Cfe, . . . ,cj_! by either of 
the methods of [26] or [8], which exploit the known mapping of Chebyshev polynomial roots 
onto the unit circle to obtain an ordering for which, heuristically, ||r( m ^|| < || r ( m - 1 )||. (These 
ordering techniques are more general, however, and are not restricted to the Chebyshev 
case.) With this ordering one may reasonably choose l — k = 4, even though this number of 
parameters is not likely to be reached within a single time step by a solver such as DASSL. 
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Another approach to the problem of excessive parameters is to use the second order 
formulation of the Chebyshev iteration. The following is an adaptation of the algorithm in 
[17]. Recall that the residual r W is F(yW). 

1. Set ajfc + i := 1/d. 

2. Set A yW := a k +i r (fc) . 

3. Set 2 /l fc+1 > := y£*> + A y£*> 

4. Set r( fc+1 ) := F(y£* +1 >). 

5. Do for m = k + 2 by 1 until convergence: 

5.1 Set a m := l/([d - c^ mp ]a m _i/4). 

5.2 Set 7 m := da m — 1. 

5.3 Set Aj := 7 mA y^~ 2 ) + 

5.4 Set yW := y ^ + Ay£ m-1) - 

5.5 Set r( m ) := F(y£ m >). 

Enddo 

For another application of Richardson’s method to nonlinear problems, see [10, 12]. 

5.3 Changing the Stepsize 

Changing the stepsize causes a potentially large change in the Jacobian of F. The effect of 
this on the Newton’s method Chebyshev parameters will be estimated under the assumption 
that no preconditioning was used at step n to solve the linear systems with matrix G. 
Additional assumptions are (i) the matrix G does not change; (ii) the solution of the linear 
systems at step n with matrix G has been by the Manteuffel adaptive Chebyshev algorithm, 
which yields the convex hull of G\ (iii) G was not preconditioned during the Chebyshev 
iteration; and (iv) that the eigenvectors of F'(y\°li) and F'(y^ are approximately the 
same. 

5.3.1 Operator Approximation at Step n + 1 

We outline a technique to apply the work to compute the parameters at step n to the 
computation of the parameters at step n + 1. 

At this point it is convenient to denote the dependence of F and M on n by F n and M n . 
The acceleration parameters are determined by the convex hull of approximations to 


! 
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eigenvalues of 


(23) 


= d-'KtibSh)- 

Consider the expression on the right. Recall that 

6 = h-fi - 

The other factor on the right of (23) is 


Therefore, 


^n+l — 





(24) 


5.3.2 The Convex Hull at Step n + 1 

The convex hull of M„+i in (24) may be estimated in terms of known quantities as follows. 
It is assumed that 

/'(!$,) « (25) 

Let 

1 1 

Mn = T T- 

lin+l n, T 

From (25) and (26), (24) becomes 

M n+1 » [h'^I - f'(y i?' } )] _1 [fi n I + h~ x I - f'(yW)\ (27) 

= n n G~ x + M n . (28) 

The convex hull of M n + 1 is needed. The eigenvectors of f'(y^) and f'{y^™ ^) are assumed 
to be approximately the same. It follows that the eigenvectors of M n and G are approximately 
the same. Therefore, if the convex hull of each matrix on the right of (28) were known then 
an approximate convex hull of the sum could be easily computed. The convex hull of M n 
is assumed known from the preceding time step. It remains to determine the convex hull of 
G~ l approximately. 

The adaptive Chebyshev method was assumed to have been used to solve the linear 

systems at the preceding step, n, for which the matrix is G. 4 The convex hull of G is 

therefore known. An approximation to the convex hull of G~ l is therefore easily obtained. 

4 Note that this application of the adaptive Chebyshev method is in the execution of the inner iteration 
rather than the outer Newton iteration. 
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6 Summary 


In this paper, a simplified analysis has been presented for the predictor error. It suggests 
that when using An iterative method for the inner iteration (with Newton’s method or a 
variant of Newton’s method as the outer iteration), the iterative method should be one for 
which components of the error are damped approximately uniformly. Two classes of iterative 
methods are sketched for which the error is damped in a uniform way, the adaptive Chebyshev 
method of Manteuffel and Richardson’s method. Richardson’s method is preferable if the 
spectrum of the inner iteration matrix is not well approximated by an ellipse. 

If the (modified) Newton’s method step is viewed as one step of Richardson’s method, 
the usual techniques for computing iteration parameters adaptively may be used to enhance 
convergence of Newton’s method with a parameter sequence. In the case when no precondi- 
tioning is used for the inner iteration, the parameters may be recomputed at low cost when 
the stepsize changes. 
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